A Large-Scale Multilingual Disambiguation of Glosses

Authors

José Camacho-Collados

Claudio Delli Bovi

Alessandro Raganato

Roberto Navigli

Abstract

Linking concepts and named entities to knowledge bases has become a crucial Natural Language Understanding task. In this respect, recent works have shown the key advantage of exploiting textual definitions in various Natural Language Processing applications. However, to date there are no reliable large-scale corpora of sense-annotated textual definitions available to the research community. In this paper we present a large-scale high-quality corpus of disambiguated glosses in multiple languages, comprising sense annotations of both concepts and named entities from a unified sense inventory. Our approach for the construction and disambiguation of the corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system; first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation, and then we combine it with a semantic similarity-based refinement. As a result we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we make it freely available at http://lcl.uniroma1.it/disambiguated-glosses. Experiments on Open Information Extraction and Sense Clustering show how two state-of-the-art approaches improve their performance by integrating our disambiguated corpus into their pipeline.
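The abstract names a semantic similarity-based refinement step but gives no details, so the following is only a minimal sketch of one way such a step could look, not the authors' implementation. It assumes each candidate sense has a vector representation and that the context is the set of vectors for senses already found in equivalent definitions in other languages. The function names, the sense identifiers, and the 0.5 threshold are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refine(candidates, context_vectors, threshold=0.5):
    """Pick the candidate sense closest to the context centroid.

    candidates: dict mapping a (hypothetical) sense id to its vector.
    context_vectors: vectors of senses taken from equivalent definitions.
    Returns the best sense id, or None if no candidate reaches the threshold.
    """
    if not candidates or not context_vectors:
        return None
    context = centroid(context_vectors)
    best_sense, best_score = max(
        ((sense, cosine(vector, context)) for sense, vector in candidates.items()),
        key=lambda pair: pair[1],
    )
    return best_sense if best_score >= threshold else None


# Toy usage with made-up two-dimensional vectors.
senses = {"bank_river": [1.0, 0.0], "bank_finance": [0.0, 1.0]}
context = [[0.1, 0.9], [0.0, 1.0]]
chosen = refine(senses, context)
```

Returning None below the threshold leaves a mention unannotated rather than forcing a low-confidence sense, which is one plausible way to favor precision in a corpus of this kind.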

Similar resources

Extending, Trimming and Fusing WordNet for Technical Documents

This paper describes a tool for the automatic extension and trimming of a multilingual WordNet database for cross-lingual retrieval and multilingual ontology building in intranets and domain-specific document collections. Hierarchies, built from automatically extracted terms and combined with the WordNet relations, are trimmed with a disambiguation method based on the document salience of the w...

Senseval-3 task: Word Sense Disambiguation of WordNet glosses

The SENSEVAL-3 task to perform word-sense disambiguation of WordNet glosses was designed to encourage development of technology to make use of standard lexical resources. The task was based on the availability of sensedisambiguated hand-tagged glosses created in the eXtended WordNet project. The hand-tagged glosses provided a “gold standard” for judging the performance of automated disambiguati...

A gloss-centered algorithm for disambiguation

The task of word sense disambiguation is to assign a sense label to a word in a passage. We report our algorithms and experiments for the two tasks that we participated in viz. the task of WSD of WordNet glosses and the task of WSD of English lexical sample. For both the tasks, we explore a method of sense disambiguation through a process of “comparing” the current context for a word against a ...

EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text

Parallel corpora are widely used in a variety of Natural Language Processing tasks, from Machine Translation to cross-lingual Word Sense Disambiguation, where parallel sentences can be exploited to automatically generate high-quality sense annotations on a large scale. In this paper we present EUROSENSE, a multilingual sense-annotated resource based on the joint disambiguation of the Europarl p...

Integrating Conceptual Density with WordNet Domains and CALD Glosses for Noun Sense Disambiguation

The lack of large, semantically annotated corpora is one of the main drawbacks of Word Sense Disambiguation systems. Unsupervised systems do not need such corpora and rely on the information of the WordNet ontology. In order to improve their performance, the use of other lexical resources need to be investigated. This paper describes the effort to integrate the Conceptual Density approach with ...

Journal: CoRR

Volume: abs/1608.06718

Publication date: 2016
